Language Reference

There are four core parts to the Scheme language:

  • The syntax for describing syntax datums, more commonly known as s-expressions. This describes how code and data are written as text.
  • The syntax for defining libraries. In their essence, libraries define a set of symbols to "import" (bring into their scope from other libraries) and "export" (provide upon being imported by other libraries).
  • A set of core libraries that provide syntax and functions allowing one to write programs.
  • A dynamic type system that describes the types of Scheme values at runtime.

This system is incredibly malleable; importing symbols, including the ones most commonly associated with Scheme's semantics, such as define, set!, or even lambda, is entirely optional. New languages can be defined as libraries, which makes the language unusually flexible.

We will go over each of these core parts briefly. For a more in-depth overview of the precise semantics of the language, please refer to the R6RS specification, of which this document is a highly abridged and paraphrased summary.

Syntax

Lexical syntax

The lexical syntax of Scheme determines how characters are split into a sequence of tokens, also known as "lexemes". Lexemes are separated by delimiters, which include any amount of whitespace along with the characters (, ), [, ], ", ;, and #. The following lists all of the possible classifications of lexemes:

Booleans

The values of True and False are represented by the lexemes #t (or #T) and #f (or #F) respectively.

Identifiers

Identifiers are sequences of characters that start with a character that is neither a delimiter nor a digit. In practice, that means an identifier starts with a letter or a special character like $ or <.

The following are all examples of identifiers:

<=
a->b
hello-world
goodbye_world
foo/bar
$$$12343$$$

Scheme is much more permissive than most other languages about which characters can appear in identifiers. Because hyphens are allowed, underscores are typically discouraged in multi-word names; hyphens are used instead, a style commonly called kebab case.

dont_do_this
do-this-instead

Numbers

Scheme provides the ability to specify a much wider collection of numerical literals than other programming languages:

1234       ; Typical base-ten digit numeral
-53        ; Negative number
+54        ; Positive number
3.1415926  ; Floating point numbers
#b0101010  ; Binary number
#o0237777  ; Octal number
#xdeadbeef ; Hexadecimal number
7/22       ; Rational number
+5.2+7.3i  ; Complex number
+inf.0     ; Positive infinity
-inf.0     ; Negative infinity
+nan.0     ; Not A Number (NaN)
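
As a quick check on how these literals read, the following sketch compares a few of them using standard R6RS procedures. It assumes the (rnrs) library can be imported as shown; the variable names are illustrative only.

```scheme
(import (rnrs))

;; Radix prefixes change only the notation, not the value.
(define binary-value #b0101010)    ; 42 in base ten
(define hex-value #xdeadbeef)      ; 3735928559 in base ten

;; Rational literals are exact, so this sum has no rounding error.
(define exact-sum (+ 7/22 15/22))  ; exactly 1

;; NaN is never numerically equal to anything, including itself.
(define nan-equal? (= +nan.0 +nan.0))  ; #f
```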

Characters

Unicode code points (also known as characters) are represented by the prefix #\ followed by the character literal, a special character name, or x followed by a hexadecimal scalar value. Valid special character names include nul, alarm, backspace, tab, linefeed, newline, vtab, page, return, esc, space, and delete.

#\a         ; Lower case letter a
#\A         ; Upper case letter A
#\(         ; Left parenthesis
#\linefeed  ; U+000A
#\newline   ; Same as #\linefeed but considered deprecated
#\λ         ; U+03BB
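
The three notations can be related with a short sketch. It assumes the standard R6RS procedures char=? and char->integer are available through (rnrs); the variable names are illustrative only.

```scheme
(import (rnrs))

;; #\x followed by hexadecimal digits names a character by scalar value.
(define same-letter? (char=? #\x41 #\A))               ; U+0041 is "A"

;; char->integer returns the scalar value of a character.
(define lambda-code (char->integer #\λ))               ; #x3BB, i.e. 955

;; Both special names denote U+000A.
(define same-name? (char=? #\linefeed #\newline))
```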

Strings

Strings are written much as in other programming languages: a sequence of characters surrounded by double quotes ("). \ is used to escape " and to introduce various other escape sequences. A \ at the end of a line can be used to skip the line break and any whitespace between the current line and the next:

"hello, world!"
"hello?\nworld!"
"A
bc"               ; This is U+0041, U+000A, U+0062 and U+0063
"A\
bc"               ; This is U+0041, U+0062 and U+0063
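
The difference between the last two literals can be observed with string-length. This is a minimal sketch assuming the standard (rnrs) string procedures; the variable names are illustrative only.

```scheme
(import (rnrs))

;; A raw line break inside a string literal is kept as U+000A.
(define with-break "A
bc")

;; A backslash before the line break removes the break from the string.
(define without-break "A\
bc")

;; \n is an escape sequence for a single U+000A character.
(define with-escape "hello?\nworld!")

(define with-break-length (string-length with-break))        ; 4
(define without-break-length (string-length without-break))  ; 3
(define with-escape-length (string-length with-escape))      ; 13
```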

Comments

Comments in Scheme come in three different flavors:

Line comments

Single line comments are indicated with the semicolon (;) character. The comment extends to the end of the line:

(+ 1 2 3 ; ) 4 5 6
   4) ; => returns 10

Block comments

Block comments are delimited by pairs of #| and |# characters. They can be nested:

#|
    This is a block comment
    #|
        This is an inner block comment
    |#
|#

Datum comments

The #; prefix can be used to comment out whole datums. Here is an example that shows every type of comment in action:

#|
    The FACT procedure computes the factorial
    of a non-negative integer.
|#
(define fact
  (lambda (n)
    ;; base case
    (if (= n 0)
        #;(= n 1)
        1       ; identity of *
        (* n (fact (- n 1))))))

Datum syntax

The datum syntax is a description of how Scheme s-expressions are represented in terms of a sequence of lexemes. There are four components to the datum syntax:

  • Pairs and lists, enclosed by ( ) or [ ]
  • Vectors, enclosed by #( )
  • Bytevectors, enclosed by #vu8( )
  • Non-standard datums, such as hashtable literals

We will go through the standard forms one by one:

Pairs and lists

The most fundamental datums are pairs and lists. The most basic of these is (), which represents the empty list. The empty list is the only value of its type.

Pairs can be represented via dot notation, i.e. (⟨datum1⟩ . ⟨datum2⟩). The first field is called the "car" (more commonly known as the "head") and the second is called the "cdr" (more commonly known as the "tail").

Lists are constructed from multiple pairs recursively in their cdr fields. For example,

(a b c d e)

is equivalent to

(a . (b . (c . (d . (e . ())))))

A list is considered "proper" if the final cdr is the empty list. For example,

(a b c d . e)

is equivalent to

(a . (b . (c . (d . e))))

and is not considered proper since the final cdr is the symbol e rather than the empty list ().
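
The equivalence between list notation and dot notation can be checked directly. This sketch assumes the standard (rnrs) procedures and uses quote (') to turn the datums into values; the variable names are illustrative only.

```scheme
(import (rnrs))

(define proper '(a b c d e))
(define improper '(a b c d . e))

;; List notation is shorthand for nested pairs ending in ().
(define same-structure?
  (equal? proper '(a . (b . (c . (d . (e . ())))))))  ; #t

(define head (car proper))  ; a
(define tail (cdr proper))  ; (b c d e)

;; list? holds only for proper lists, but both values are pairs.
(define proper-is-list? (list? proper))      ; #t
(define improper-is-list? (list? improper))  ; #f
(define improper-is-pair? (pair? improper))  ; #t
```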

Vectors

Vectors, also known as arrays, are represented with the notation #(⟨datum⟩ ...). For example, a vector of length three that contains the number zero at index zero, a pair of two numbers at index one, and a string at index two could be represented as follows:

#(0 (1 . 2) "Alice")
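
The indices described above can be confirmed with a short sketch. It assumes the standard (rnrs) vector procedures; the literal is quoted because R6RS does not treat vector datums as self-evaluating expressions. The variable names are illustrative only.

```scheme
(import (rnrs))

(define example '#(0 (1 . 2) "Alice"))

(define size (vector-length example))    ; 3
(define first (vector-ref example 0))    ; 0
(define second (vector-ref example 1))   ; (1 . 2)
(define third (vector-ref example 2))    ; "Alice"
```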

Bytevectors

Similar to vectors, bytevectors are arrays, but they can only contain values that can fit in a single unsigned 8-bit byte. For example, a bytevector of length three containing the values 1, 2, and 255 could be represented as follows:

#vu8(1 2 255)
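
A bytevector can be inspected in much the same way as a vector. This is a minimal sketch assuming the standard (rnrs) bytevector procedures; the variable names are illustrative only.

```scheme
(import (rnrs))

(define bytes '#vu8(1 2 255))

(define size (bytevector-length bytes))          ; 3
(define last-byte (bytevector-u8-ref bytes 2))   ; 255
(define as-list (bytevector->u8-list bytes))     ; (1 2 255)
```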

Library syntax

Libraries provide a syntax for importing and exporting symbols.

Libraries have the following form:

(library (name ... version?) 
  (export export-spec ...)
  (import import-spec ...)

  body)

The ⟨name⟩ of the library is a list of symbols, and should match the location of the library in the filesystem. For example, a library named (foo bar baz) should be located at foo/bar/baz.sls.

The optional ⟨version⟩ of a library is either null or a list of integers that specify the semantic version of the library.

⟨export-spec⟩ is either a symbol specifying a variable to be exported or a datum of the form (import ⟨import-spec⟩). Exports of the latter form export all values included in the import.

An ⟨import-spec⟩ has one of the following forms:

  • ⟨library-reference⟩
  • (library ⟨library-reference⟩) Allows for importing of libraries that include the words "only", "except", "prefix", or "rename".
  • (only ⟨import-spec⟩ ⟨identifier⟩ ...) Imports only the identifiers specified from the import spec.
  • (except ⟨import-spec⟩ ⟨identifier⟩ ...) Imports all of the identifiers from the import spec except for the ones specified.
  • (prefix ⟨import-spec⟩ ⟨identifier⟩) Prefixes all of the identifiers in the import spec with the provided identifier.
  • (rename ⟨import-spec⟩ (⟨identifier1⟩ ⟨identifier2⟩) ...) Renames each ⟨identifier1⟩ provided by the import spec to the corresponding ⟨identifier2⟩.
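
The import forms listed above can be combined and nested. The following sketch is a top-level program rather than a library, and it assumes the standard R6RS libraries (rnrs base), (rnrs lists), and (rnrs io simple) are available; the prefix list: and the name show are arbitrary choices made for illustration.

```scheme
(import (except (rnrs base) vector-fill!)
        (prefix (only (rnrs lists) fold-left filter) list:)
        (rename (only (rnrs io simple) display newline)
                (display show)))

;; fold-left and filter are visible only under the list: prefix.
(define total (list:fold-left + 0 '(1 2 3 4)))  ; 10
(define evens (list:filter even? '(1 2 3 4)))   ; (2 4)

;; display was renamed on import, so it is called as show here.
(show total)
(newline)
```

Nesting import specs this way keeps the set of visible names small and makes it clear where each one comes from.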

Type system

Scheme is dynamically typed, meaning the type of a value is determined at run time and not at compile time.

Each Scheme value has a single type, which falls into one of the following categories:

  • Null: Has exactly one value, the empty list (). Commonly known as the unit type.
  • Pair: A collection of two values.
  • Boolean: Can either be true or false.
  • Character: A Unicode code point.
  • Number: A numerical value on the numerical tower.
  • String: An array of Unicode code points.
  • Symbol: A symbol. Conceptually similar to an immutable string. Symbols are interned so that symbols with the same spelling always satisfy eq?.
  • Vector: An array of values.
  • Bytevector: An array of bytes.
  • Syntax: Value containing a representation of the datum syntax, including source code information.
  • Procedure: A Scheme procedure, more commonly known as a closure.
  • Record: A record.
  • Record Type Descriptor: A description of a record's type.
  • Hashtable: A hash table.
  • Port: A value that can handle input/output from the outside world.
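
Because types are checked at run time, a program can ask a value which category it belongs to. The sketch below assumes the standard R6RS type predicates from (rnrs); type-name is a hypothetical helper written for this example, and it leaves out records, record type descriptors, and syntax objects for brevity.

```scheme
(import (rnrs))

;; Return a symbol naming the category a value belongs to.
(define (type-name value)
  (cond ((null? value) 'null)
        ((pair? value) 'pair)
        ((boolean? value) 'boolean)
        ((char? value) 'character)
        ((number? value) 'number)
        ((string? value) 'string)
        ((symbol? value) 'symbol)
        ((vector? value) 'vector)
        ((bytevector? value) 'bytevector)
        ((procedure? value) 'procedure)
        ((hashtable? value) 'hashtable)
        ((port? value) 'port)
        (else 'other)))

(define null-type (type-name '()))        ; null
(define pair-type (type-name '(1 . 2)))   ; pair
(define symbol-type (type-name 'hello))   ; symbol
(define procedure-type (type-name car))   ; procedure

;; Symbols are interned, so equal spellings are the same object.
(define interned? (eq? 'hello 'hello))    ; #t
```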

Numeric tower

Numbers belong to a hierarchy of increasingly larger sets:

  • Integers, which are a subset of
  • Rationals, which are a subset of
  • Reals, which are a subset of
  • Complex numbers, of which all numbers are members.

In scheme-rs, integers are represented as either 64-bit signed integers or big numbers, depending on their size. Rationals are represented as two big numbers. Reals are represented as 64-bit IEEE floating-point numbers. Complex numbers are composed of two simple numbers, each of which can be any of the numeric types previously listed.
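
The subset relationships can be observed with the standard numeric predicates. This is a minimal sketch assuming the R6RS predicates from (rnrs) behave as specified; the variable names are illustrative only.

```scheme
(import (rnrs))

;; Each set contains the ones before it, so an integer is also
;; rational, real, and complex.
(define integer-is-rational? (rational? 5))    ; #t
(define rational-is-real? (real? 7/22))        ; #t
(define real-is-complex? (complex? 3.14))      ; #t

;; The reverse does not hold.
(define rational-is-integer? (integer? 7/22))  ; #f
(define complex-is-real? (real? +5.2+7.3i))    ; #f

;; Integers too large for 64 bits stay exact.
(define big (expt 2 100))
(define big-is-exact? (exact? big))            ; #t
```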